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Abstract 

We present a review of recent works on the mathematical analysis of 
algorithms which have been proposed by A.F. Voter and co-workers in the 
late nineties in order to efficiently generate long trajectories of metastable 
processes. These techniques have been successfully applied in many con¬ 
texts, in particular in the field of materials science. The mathematical 
analysis we propose relies on the notion of quasi stationary distribution. 


1 Introduction 

This article is a review of recent works whose aim is to lay the mathematical 
foundations of algorithms used in computational statistical physics, namely 
accelerated dynamics techniques introduced by A.F. Voter and co-workers in 
the late nineties. These methods have been proposed in order to efficiently 
sample trajectories in the context of molecular dynamics. 

Molecular dynamics is used in various application fields (biology, chemistry, 
materials science) in order to simulate the evolution of a molecular system, 
namely interacting particles representing atoms or group of atoms. The typical 
dynamics one should have in mind is the Langevin dynamics; 

jdqt = M~^Ptdt 

\ dpt = - W {qt) dt - 'yM~^pt dt + \/2'yfd-^dWt 

where {qt,Pt) denotes the positions and momenta of the particles at time t > 0, 
M is the mass tensor, V is the potential function which, to a given set of posi¬ 
tions q, associates its energy V(q), 7 > 0 is a friction parameter, /3 = (/cbT)“^ is 
proportional to the inverse temperature and IJJ is a standard Brownian motion. 
In the following, we assume that qt G where d is typically very large (say 
3 times the number of particles) but generalizations to dynamics on manifolds 
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(systems with constraints) are straightforward. When 7 = 0, the Langevin 
dynamics is nothing but the Hamiltonian dynamics. The terms involving 7 
model the fact that the system is at a given temperature. Indeed, under loose 
assumptions on V, this dynamics is ergodic with respect to the canonical mea¬ 
sure (NVT ensemble): for any test function (^ : x —>■ M, 

Jq = Z~^ j (p{q,p)exp{-/3{p'^M-^p/2 + V{q)))dpdq 

where Z = f exp(—+ V{q)))dpdq < 00 . In the following, we will 
also consider the overdamped Langevin dynamics which is obtained from 0 in 
the limit 7 ^ oo or M —0 (see for example [9l Section 2.2.4]): 

dXt = -VV{Xt) dt + y/W^dWt (2) 

where Xt G denotes the position of the particles. Again, under loose as¬ 
sumptions on V, this dynamics is ergodic with respect to the canonical measure 
Z~^ exp(—/3H(x)) dx where Z = J exp(—/3H(x)) dx. The aim of molecular sim¬ 
ulations is to compute macroscopic properties from the models 0 or 0 at the 
atomistic level {V being the main modelling ingredient). In this article, we are 
particularly interested in so-called dynamical quantities, namely macroscopic 
observables which depend on the path {qt,Pt)t>o or {Xt)t>o- For example, one 
would like to sample the paths which go from one region of the phase space to 
another one, in order to compute the typical time to observe such transitions 
or the intermediate states along the transition path. 

The numerical difficulty associated with such computations is that the 
timescale at the microscopic level is much smaller than the timescale at the 
macroscopic level. More precisely, the timestep required to obtain a stable 
discretization of the above dynamics is much smaller than the timescale asso¬ 
ciated with the macroscopic observables of interest. In other words, one has 
to simulate very long trajectories of a high dimensional stochastic dynamics. 
In practice, for many applications, a naive discretization of the dynamics is 
not sufficient to reach the timescales of interest, since it would require up to 
typically 10 ^® iterations (the timescale at the atomistic level - bond vibration - 
is indeed of the order of 10 “^^ s, while transitions between metastable regions 
occur may over timescales ranging from microseconds to seconds). 

The idea of accelerated dynamics is to take benefit from this timescale dis¬ 
crepancy in order to simulate more efficiently paths over very long times. In¬ 
deed, the typical trajectories of Q or Q are metastable: this means that the 
trajectory remains trapped for very long times in some region of the phase 
space, before hopping to another region where it again remains trapped. These 
regions are called metastable states. Metastability originates from energetic 
barriers (the path to leave the state requires to climb above a saddle point of 
the potential energy V) or from entropic barriers (the path to leave the state 
goes through a narrow corridor, due to some steric constraints in the system 
for example), or more generally from a combination of energetic and entropic 
effects. The bottom line is thus that behind the continuous state space dy¬ 
namics Q or there is a discrete state space jump process (encoding the 
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jumps from metastable states to metastable states). Actually, discrete state 
space Markov dynamics are also very much used in molecular dynamics: there 
are called kinetic Monte Carlo or Markov state models, see for example m- 
And continuous state space models are typically used in order to parametrize 
these Markovian models (namely to compute the jump rates between metastable 
states) using for example Arrhenius (or Eyring-Kramers) formulas. The accel¬ 
erated dynamics of A.F. Voter follow a different path: the principle is to use the 
underlying jump process in order to accelerate the sampling of the original dy¬ 
namics, in the spirit of a predictor-corrector schemes. These are thus numerical 
methods to efficiently generate the underlying jump process among metastable 
states. 

In the following, we will assume that we are given a mapping 

S (3) 

which to a given set of positions x G associates S{x), the label of the state 
in which x lies. One should think of the states 

5“^({n}) = {x G M'^,5(x) = n} for re G N 

as the metastable states mentioned above. This mapping thus defines a par¬ 
tition of the state space. Let us make two comments on this mapping. First, 
an important message from the mathematical analysis we present below is that 
whatever the mapping S, the accelerated dynamics algorithms are consistent: 
they give the correct result in some limiting regime. For example for the parallel 
replica method, in the limit when the correlation time - a numerical parame¬ 
ter introduced below - goes to infinity, the generated dynamics are statistically 
correct. In particular if some of the states happen not to be metastable, or if 
for one specific realization, the stochastic process does not remain trapped in 
one of this state (because, for example, it enters the state with a too large ve¬ 
locity), the algorithms are still consistent. Second, as will become clear below, 
the numbering of the states do not need to be known a priori: the states are 
numbered as the simulation goes, when they are successively discovered by the 
stochastic process. 

Three algorithms have been proposed by A.F. Voter and co-workers. The 
idea is that if the stochastic process remains trapped for a very long time 
in a given state S = 5“^({re}) (for some given re), then there are ways to 
generate the exit event from this state much more efficiently than by running 
the original dynamics until the exit time. The exit event is fully characterized 
by two random variables: the exit time and the exit point from S, which are 
defined by considering the first hitting time and point on the boundary dS. Let 
us roughly describe the ideas behind the three algorithms. 

In the Parallel Replica method [TTj, the principle is to simulate in parallel 
many trajectories following the original dynamics Q or Q, to consider the 
first exit event among the replicas, and to generate from this first exit event a 
consistent exit time and exit point. The gain is thus obtained in terms of wall 
clock time. This algorithm can be seen as a way to parallelize a computation 
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in time, which is not an easy problem in general due to the sequential nature 
of time evolutions. 

In the hyperdynamics US], the idea is to modify the potential V within the 
state S in order to accelerate the exit from the state for the original dynamics Q 
or (§. Again, using an appropriate time rescaling, it is possible to generate from 
the observed exit event on the biased potential an exit event which is consistent 
with what would have been observed on the original unbiased potential V. 

The Temperature Accelerated Dynamics (TAD) |12j consists in considering 
the original dynamics 0 or 0 at a higher temperature than the original one. 
The idea is then that under appropriate assumptions, there is a way to infer 
from the exit events observed at high temperature the exit event which would 
have been observed at the original lower temperature. 

The ultimate aim of these three techniques is thus to generate efficiently 
the so-called state-to-state dynamics {‘S{qt))t>o (for Q) or {S{Xt))t>o (for Q), 
with the correct statistical properties. Let us emphasize that the objective is 
to get the correct law on the paths (in order to compute dynamical quantities), 
not only on the time marginals for example. 

A crucial mathematical tool to understand these techniques is the Quasi- 
Stationary Distribution (QSD) introduced in Section]^ We will then describe 
the mathematical results which have been obtained so far on the three algo¬ 
rithms: Parallel Replica in Section hyperdynamics in Section and Tem¬ 
perature Accelerated Dynamics in Section A few concluding remarks are 
provided in Section 

2 The Quasi-Stationary Distribution and the decor¬ 
relation step 

Let us consider a fixed state S = 5“^({n}) and let us focus for simplicity on the 
overdamped Langevin dynamics We assume that 5 is a bounded regular 
subset of Let us consider the first exit time from S: 

Ts = M{t >0,Xt^S}. 


2.1 The QSD 

Let us start with the definition of the QSD. 

Definition 1. A probability measure v with support in S is called a QSD for 
the Markov process {Xt)t>o if and only if 

, , L ¥(Xf e A, t< r?) vidx) 

yt > 0, VA c 5, 1 / A = \ * , - ’ , f;, A 

fsF{t<TI)u{dx) 

In other words, is a QSD if, when Xq is distributed according to v, the 
law of Xt conditionally to the fact that (A 5 )o<s<t remains in the state S is still 
u, for all positive t. 

The QSD satisfies three properties which will be crucial in the following. We 
refer for example to [6] for a proof of these results and to [1] for more general 
results on QSDs. 
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Proposition 2. Let {Xt)t>o follow the dynamics Q with an initial condition 
Xq G S. Then, there exists a probability distribution u with support in S such 
that 

lim C{Xt\Ts >t) = u. (4) 

t^OO 

The distribution v is the QSD associated with S. 


A consequence of this proposition is the existence and uniqueness of the 
QSD. The QSD can thus be seen as the longtime limit of the process conditioned 
to stay in the well. This proposition can be useful to understand what is a 
metastable state. A metastable state is a state such that the typical exit time 
is much larger than the local equilibration time, namely the time to observe the 
convergence to the QSD in Q. 

Let us now give a second property of the QSD. 

Proposition 3. Let L = —VP • V + be the infinitesimal generator of 

{Xt)t>o (satisfying Qj. Let us consider the first eigenvalue and eigenfunction 
associated with the adjoint operator L* = div (VP + /3“^V) with homogeneous 
Dirichlet boundary condition: 

{ L*ui = —Aiui on S, 
n 

ui = (J on Ob. 


The QSD v associated with S satisfies: 

^ ui{x) dx 
ui(x} dx 

where dx denotes the Lebesgue measure on S. 


The QSD thus has a density with respect to the Lebesgue measure, which is 
nothing but the ground state of the Fokker-Planck operator L* associated with 
the dynamics with absorbing boundary conditions. 

Finally, the last property of the QSD concerns the exit event, when Xq is 
distributed according to v. 


Proposition 4. Let us assume that Xq is distributed according to the QSD v 
in S. Then the law of the couple {Ts,Xts) (namely the first exit time and the 
first exit point) is fully characterized by the following properties: 

• Ts is exponentially distributed with parameter Ai (defined in Equation Q 
above) 


• Tg is independent of Xxg 


• The law of Xt^ is given by: for any bounded measurable function ip : 




¥T{p{Xts)) = 


JggpdnUida 
fiXi fg ui{x) dx 


( 6 ) 


where a denotes the Lebesgue measure on dS induced by the Lebesgue 
measure in and the Euclidean scalar product, and dnUi = Vtti • n 
denotes the outward normal derivative of ui on dS. 
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This Proposition explains the interest of the QSD. Indeed, if the process 
is distributed according to the QSD in S (namely, from Proposition if it 
remained for a sufficiently long time in S) , then the exit event from the state S 
is Markovian, in terms of state-to-state dynamics. This is reminiscent of what 
is assumed to build kinetic Monte Carlo models (see US]). 

Remark 5. The existence of the QSD and the convergence of the constrained 
process towards the QSD for the Langevin process Q requires extra work com¬ 
pared to the overdamped Langevin process (§. For results in that direction, we 
refer to the recent manuscript m 

2.2 The decorrelation step 

The accelerated dynamics algorithms will be based on the assumption that the 
process remained sufficiently long in the state S so that one can consider it 
is distributed according to the QSD u. Then, using this assumption, various 
techniques are used in order to efficiently generate the exit event from S, starting 
from the QSD (see the next sections). 

A natural preliminary question is therefore: how to assess in practice that 
the limit has been reached in Q ? This is done in the so-called decorrelation 
step which consists in waiting for a given time Tcorr (a so-called decorrelation 
time) before assuming that the local equilibrium n has been reached. This 
correlation time can be state dependent, and is typically supposed to be known 
a priori. 

From a mathematical viewpoint, Tcorr should be chosen sufficiently large so 
that the distance between the law of conditioned to Ts > Tcorr and the 

QSD n is small. In [^, we prove the following: 

Proposition 6. Let {Xt)t>o satisfies ([^ with Xq G S. Let us consider — A 2 < 
— Ai < 0 the first two eigenvalues of the operator L* on S with homogeneous 
Dirichlet boundary conditions on dS (see Proposition^ for the definition of 
L*). Then, there exists a constant C > 0 which depends on the law of Xq, such 
that, for all t> 

sup |E(/(rs-t,XTj|rs >t)-EQ/(rs,XrJ)| <C'exp(-(A 2 -Ai)Q. 
/> ll/llr°°<l 

In other words, the total variation norm between the law of {Ts—t, X^g) con¬ 
ditioned to Ts > t (for any initial condition Xq G S), and the law of {Ts,XTg) 
when Xq is distributed according to n, decreases exponentially fast with rate 
A 2 — Ai. This means that Tcorr should be chosen of the order 1/(A2 — Ai). Of 
course, this is not a very practical result since computing these eigenvalues is 
in general impossible. From a theoretical viewpoint, this result tells us that the 
local equilibration time is of the order 1/(A2 — Ai), so that, the state S will be 
metastable if this time is much smaller than the exit time (which is typically of 
the order of 1/Ai, see Proposition |^ . 

Let us mention the recent work [3| where we propose a numerical method to 
approximate the time to reach the QSD using a Fleming-Viot particle process 
together with stationarity statistical tests. The interest of the approach is 
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demonstrated on toy examples (including the 7 Lennard-Jones cluster), but it 
remains to test this technique on higher dimensional problems. 

From now on, we assume that the decorrelation step has been successful, 
and we look for efficient techniques to generate the exit event (namely a sam¬ 
ple of the random variables {Ts, Let us describe successively the three 

algorithms which has been proposed by A.F. Voter and co-workers. 

3 The Parallel Replica method 



Figure 1: The Parallel Replica method: many exit events are simulated in 
parallel, all starting from the QSD in S. The blue trajectory represents the 
reference walker which stays sufficiently long within S so that we can assume 
the blue point is distributed according to the QSD. The red points represent 
i.i.d. initial conditions distributed according to the QSD. The black trajectories 
are simulated in parallel. 

Let us assume that we are given an initial condition Xo G S such that Xo 
is distributed according to the QSD in S. Let us assume that we are given a 
computer with many CPUs in parallel. The idea of the parallel replica method 
is to distribute N independent initial conditions (VQ)i<j<jv in S according to 
the QSD v, to let them evolve according to 0 driven by independent motions 
(so that the replicas remain independent) and then to consider the first exit 
event among the replicas: 

Jo = arg min Tl where Th = inf{f > 0, XI 0 S}. (7) 

The integer Iq G {1,..., iV} is the index of the first replica which exits S, and 
min(rj,... ,Tg) = . The effective exit time is set as N times the first exit 

time, and the effective exit point is nothing but the exit point for the first exit 
event. See Figure for a schematic illustration of the method. 

The consistency of the method is a corollary of Proposition]^ Indeed using 
the fact that, starting from the QSD, the exit time is exponentially distributed 
and independent of the exit point, we easily obtain that 

iVr^° = N min(ri,..., T#) ^ Ti (8) 
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which means that the effective exit time has the correct law and 




which means that the first exit point of the replica Jq (the first one to exit 
among N) has the same law as the first exit point of any of them. Moreover 
NTg° = Nmin{Tg ,..., T^) and are independent, so that we have proven 

the following Lemma. 

Lemma 7. Let Iq be the index of the first replica exiting S, defined by 0. 
Then we have the equality in law: 



This Lemma shows that the parallel replica is exact: the law of the effective 
exit time and exit point is exactly the law of the exit time and exit point which 
would have been observed for only one replica. 

Let us make a few remarks on this algorithm. First, the full algorithm 
actually iterates three steps: 


The decorrelation step (see Section 2.2), where a reference walker is run 
following the dynamics Q until it remains trapped for a sufficiently long 
time in one of the sets 5~^({n}), so that is can be assumed to be dis¬ 
tributed according to the QSD u associated with this set. During this step, 
the algorithm thus consists simply in integrating the original dynamics. 
No error is introduced and there is no computational gain. 


The dephasing step which is a preparation step during which N inde¬ 
pendent initial conditions distributed according to v are sampled, each 
one on a different CPU. This is done in parallel. During this step, the 
simulation clock is stopped. This step is thus pure overhead. This step 
requires appropriate algorithms to sample the QSD such as rejection al¬ 
gorithm or Fleming-Viot particle systems (see 0 )- For example, the re¬ 
jection algorithm consists in running independently walkers following the 
dynamics ([^ (starting from a point within S) and to consider the final 
point of the trajectory conditionally to the fact that the walker remains 
in the state, for a sufficiently long trajectory (typically for the time Tcorr 
introduced in Section 2.2). 


• The parallel step, just described above, which consists in running the N 
replicas independently in parallel, and in waiting for the first exit event 
among the N replicas. The simulation clock is then updated by adding 
the effective exit time . The exit point is used as the initial 

condition of the reference walker for the next decorrelation step. The 
computational gain of the whole algorithm comes from this step which 
divides the wall clock time to sample the exit event by the number of 
replicas N. 




In practice, if the rejection algorithm is used in the dephasing step, one actually 
does not need to wait for the N replicas to be dephased to proceed to the parallel 
step, see mum- 

In view of the above discussions, the errors introduced in the algorithm 
have two origins. First, in the decorrelation step, Tcorr should be chosen suffi¬ 
ciently large so that at the end of the decorrelation step, the reference walker 
is indeed distributed according to a probability law sufficiently close to the 
QSD. The convergence result of Proposition shows that the error is of the 
order 0{exp{—Tcorr/{^2 — Second, in the dephasing step, the sampling 

algorithm of the QSD should be sufficiently precise in order to obtain i.i.d. sam¬ 
ples distributed according to u. For the rejection algorithm, independence is 
ensured, and the accuracy is again related to the convergence result of Propo¬ 
sition For Fleming-Viot particle process, some correlations are introduced 
among the replicas, and it is an open problem to evaluate the error introduced 
by these correlations. As already mentioned above, in [3], we recently proposed 
an algorithm to compute on-the-fly a good correlation time, while sampling the 
QSD, using a Fleming-Viot particle process. 

The parallel replica is thus a very versatile algorithm. In particular it applies 
to both energetic and entropic barriers. The only errors introduced in the 
algorithm are related to the rate of convergence of the conditioned process to 
the QSD. The algorithm will be all the more efficient than the convergence 
time to the QSD is small compared to the exit time (namely the states are 
metastable); in this case, the speed-up in terms of wall clock time is linear as 
a function of N. We refer to [3l Section 5.1] for a discussion of the parallel 
efficiency of the algorithm. 

Let us finally mention the recent work [2] where we propose an extension 
of the Parallel Replica algorithm to Markov chains (namely discrete-in-time 
stochastic processes). This is indeed a relevant question since in practice, the 
continous-in-time dynamics (such as 0 or Q) are approximated by discrete- 
in-time Markov Chains using appropriate time discretization schemes. The 
algorithm has to be slightly adapted since, starting from the QSD (which is 
still perfectly well defined in this context), the exit time is not exponentially 
distributed but has a geometric law. Therefore, the formula ^ does not hold 
and is replaced by the following: for T^,...,T'^ N i.i.d. random variables 
geometrically distributed, 

V(min(r\...,r^) - 1) -hmin(f € {!,.. .,iV}, T = min(T\ ..., T^)) = 

This yields a natural adaptation of the original Parallel Replica algorithm to 
Markov chains. 

4 The hyper dynamics 

Let us again assume that we are given an initial condition Xq G S such that Xq 
is distributed according to the QSD in S. In other words, let us assume that we 
are at the end of the decorrelation step: the reference walker stayed sufficiently 
long in S. 
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Figure 2: The hyperdynamics: the exit event is simulated on a biased potential 
in S. The honeycomb region represents the support of the biasing potential 5V. 

The principle of the hyperdynamics algorithm is then to raise the potential 
inside the state in order to accelerate the exit from S. The algorithm thus 
requires a biasing potential 5V : S' —)• M, which satisfies appropriate assumptions 
detailed below. The algorithm then proceeds as follows: 

• Equilibrate the dynamics on the biased potential V + 5V, namely run the 

dynamics Q on the process over the biased potential condition¬ 

ally to staying in the well, up to the time the random variable has 
distribution close to the QSD associated with the biased potential. 
This first step is a preparation step, which is pure overhead. The end 
point Xf^ will be used as the initial condition for the next step. 

• Run the dynamics Q over the biased potential V + 5V up to the exit 
time from the state S. The simulation clock is updated by adding 
the effective exit time BT^ where B is the so-called boost factor defined 


by 



(9) 


The exit point is then nsed as the starting point for a new decorrelation 
step. 

See Figure for a schematic illustration of the method. 

Roughly speaking, the assumptions required on 5V in the original paper |13j 
are twofold: 

• 5V is snfhciently small so that the exit event from the state S still sat¬ 
isfies the standard assumptions used for kinetic Monte Carlo models and 
transition state theory. 

• 6V is zero on (a neighborhood) of the boundary dS. 

The derivation of the method relies on explicit formnlas for the laws of the 
exit time and exit point, using the transition state theory. The aim of the 
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mathematical analysis presented below is to give a rigorous set of assumptions 
to make this algorithm consistent. 

The algorithm we present here is actually slightly different from the way 
it is introduced in the original paper m Indeed, in the original version, 
the local equilibration steps (decorrelation step and equilibration step on the 
biased potential) are omitted: it is assumed that the states are sufficiently 
metastable (for both the original potential and the biased potential) so that 
these local equilibrations are immediate. It would be interesting to check if the 
modifications we propose here improve the accuracy of the method. 

Let us now discuss the mathematical foundations of this technique, and 
in particular, a way to understand the formula ([^ for the boost factor. We 
actually need to compare two exit events. The first one is the exit event for 
the original process Xt following the dynamics starting from the QSD u 
associated with the state S and the dynamics with potential V. The second one 
is the exit event for the process following the dynamics Q on the biased 
potential V + 6V, starting from the QSD associated with the state S and 
the dynamics with potential V + 5V. Referring to Proposition comparing 
the two exit events amounts to understanding how the first eigenvalue Ai and 
the normal derivative of the first eigenvector dnUi are modified when changing 
the potential from V to V + 6V. Let us denote Ai(IL) (resp. Ai(P + 6V)) and 
dnUi{V} (resp. dnUi{V + hP)) the first eigenvalue and the normal derivative 
when considering the original potential V (resp. the biased potential V + 5V). 

In [ 8 ], we prove the following. 

Theorem 8. Let us make the following assumptions on V. We assume there 
exists an open set S~ such that S~ C S and: 

• Regularity.' V and Plas are Morse functions. 

• Localization in S~ of the eigenvectors associated with small eigenvalues.' 

(i) |VP| ^0 inS\S- ; 

(a) dnV > 0 on dS~ ; 

(Hi) mines' V > mingg- V ; 

(iv) ming 5 - P—cvmax > cvmax—min^- P where cvmsix = max{P(x),x s.t. |VP(x)| 

0 }. 

• Non degeneracy of exponentially small eigenvalues.' The critical values of 
V in S~ are all distinct and the differences V{y) — V(x) are all distinct, 
where x G ranges over the local minima ofV\s- and y G ranges 
over the critical points ofV\g- with index 1. 

Let us also assume that the biasing potential 6V is such that 

• P + 6V satisfies the same assumptions as the ones on V above ; 

• 5V = 0 on S\S~. 
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Then, there exists c > 0 such that, in the limit /3 —)> oo, 


.-hv 


(1 + Oie-f ^^)), 

dn[ui{V)] 




['“i(^)] llLi(as) 


(10) 

+ 0(e-^") inL\dS). (11) 


Ai(y + <5y)_ Jse 
Ai(y) fge-h(v+m 

dn[ui{V + 6V)] _ 

||9n [^tl(^ + <5^)] 11^1(95) I 

The proof is based on results from semi-classical analysis for boundary Wit¬ 
ten Laplacians. 

Notice that the formula (10) provides a justification of the formula (|^ for 
the boost factor. Indeed, by assuming that TJ^ is sufficiently large, we have by 
an ergodic property: 


B = 


rTlv 




TSV 


fgexp(^dV) exp(-^(V + <5I/)) 
f^exp(-/3(V + SV)) 
fgexp(-/3V) 


f^exp(-/3(V -f <5y)) 

Ai(V + SV) 


M(V) • 

By multiplying the exit time on the biased potential V + hV by j we 

indeed obtain (in law) the exit time on the original potential V. Moreover, 


the estimate (11) shows that (up to exponentially small errors in the limit of 
small temperature), the first exit point from S for the biased potential has the 
same distribution as the first exit point from S for the original potential (see 
Equation (j^ in Proposition]^. 

A practical aspect we do not discuss here at all is the effective construction 
of the biasing potential 5V. In the original article |13] . A.F. Voter proposes a 
technique based on the Hessian V^V. A well-known method in the context of 
materials science is the bond-boost method introduced in Uni- 

Notice that, contrary to the Parallel Replica method, the hyperdynamics is, 
at least for our mathematical analysis, limited to energetic barriers (see assump¬ 
tions (Hi) and (iv) in Theorem]^. On the other hand, for very high energetic 
barriers, hyperdynamics is in principle much more efficient: the Parallel Replica 
method only divides the exit time by N (the number of replicas), while for deep 
wells, the boost factor B is very large. 


5 The Temperature Accelerated Dynamics 

Let us finally introduce the Temperature Accelerated Dynamics (TAD), see |12j . 
Let us assume again that we are at the end of the decorrelation step: the 
reference walker stayed sufficiently long in S. The principle of TAD is to increase 
the temperature (namely increase /3“^ in ([^) in order to accelerate the exit from 
S. The algorithm consists in 
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Simulating many exit events from S at high temperature, starting from 
the QSD at high temperature, 


• Extrapolating the high temperature exit events to low temperature exit 
events using the Arrhenius law. 

As for the hyperdynamics algorithm, in the original paper jl 2 j . no equilibration 
step is used: it is assumed that the states are sufficiently metastable at both 
high and low temperatures so that the convergence to the QSD is immediate. 
Let us now describe more precisely how the extrapolation procedure is made. 



Figure 3: The Temperature Accelerated Dynamics: exit events are simulated at 


higher temperature and then extrapolated to the original smaller temperature. 


Let us consider the exit event from S, at a given temperature. The set S 
is surrounded by I neighboring states and let us denote by dSi the common 
boundary with th i-th neighboring state, i G The sets dSi thus form 

a partition of the boundary dS. Let us introduce, for i G { 1 ,..., /}, the saddle 
point Xi G dSi which is the lowest in energy on dSi'. in the small temperature 
regime, the paths leaving the state S through dSi will leave through a neigh¬ 
borhood of Xi (this can be inferred from results from the large deviation theory 
for example). Let us also denote by xq the global minimum of V on S. We 
refer to Figure for a schematic representation of the geometry. The interesting 
quantities to define the exit events are: 

• The probability to exit through dSi which writes (see Proposition]^: 



p{i) = P(Xrs G dSi) = 

/3Ai / ui{x)dx 

Js 


• And the parameter of the exponential random variable T 5 : 

Ai = l/E{Ts). 


Notice that one way to rewrite the exit event is to attach to each exit regions 
dSi (or to each saddle point xQ a rate 


k{i) = Xipit) 
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and to consider I independent exponential random variables r* with parameter 
k(i). The exit event is then given by 

• the exit time min(ri, . .. ,ti) = Ts 

• and the exit region argmin(ri,..., r/), since , for i e {1 ,..., /}, P(argmin(ri, ... ,ti) 
i) =p{i). 

This description of the exit event in terms of rates attached to neighboring sad¬ 
dle points is exactly what is used for kinetic Monte Carlo models [15] . The TAD 
algorithm requires an approximation of the rate k{i), namely the Arrhenius law: 

k{i) = \ip{i) ~ Pi exp(-/3(D(x*) - V(xq))) (12) 


where pi is independent of /3. In the original paper [T2|, the Arrhenius law is 
justified using the (harmonic) transition state theory. 

Let us now go back to the TAD algorithm. Using the underlying kinetic 
Monte Carlo model presented above, and using the Arrhenius law (12), one 
observes that: 


k^\i) _ 

kf‘°{i) \^°p^°{i) 


c.exp(-(/3'*'-/3‘°)(U(xi)-U(xo))) 


dIo 


(13) 


where, with obvious notation, (3^° denotes the inverse low temperature, the 
inverse high temperature, and the superscripts lo and hi refer to the associ¬ 
ated quantities respectively at low and high temperature. The extrapolation 


formula (13) is used in TAD in order to infer the exit event at low temperature 


from the exit events observed at high temperature, by using the formula: 


(T'»,....Tr) = (e‘Tf 


lo\ ^ 


\l^hi 


.eV) 


(14) 


where 


0 * = 


k^^{i) 

k^°{i) 


exp(-(/3^*-n(U(xi)-U(xo))) 


jlo 


is a multiplicative factor constructed from the ratio of the rates (13). In the 
equality in law in (14) the random variables are, as described above, expo¬ 
nential random variables with parameter k^''^^°{i). To have analytical formula 
for the correction factors 0j and make the algorithm practical, the Arrhenius 
is assumed to be exact and one uses in practice 0* = exp(—(/3^* — [3^°){V (xj) — 
V{xo))). 

The TAD algorithm thus consists in running the dynamics at high temper¬ 
ature, observing the exit events through the saddle points on the boundary of 
the state, and updating the exit time and exit region that would have been 
observed at low temperature. More precisely, if exits through the saddle points 
{si,..., Sfc} C { 1 ,..., /} have been observed, one computes min(0^iT^j®,..., 




to get the exit time and the exit region at low 


and argmin(0® 
temperatnre. 

The interest of TAD compared to a brute force saddle point search is that it 
is not required to observe exits through all the saddle points in order to obtain 
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a statistically correct exit event. Indeed, a stopping criterium is introduced to 
stop the calculations at high temperature when the extrapolation procedure will 
not modify anymore the low temperature exit event (namely will not modify 
min(0*ir^^*,..., {si,..., Sk} C {1,..., being the saddle points dis¬ 

covered up to the stopping time). This stopping criterium requires to provide 
some a priori knowledge, typically a lower bound on the barriers V (xj) — V (xq) 
(zG {!,...,/}) (there is also a variant using a lower bound on the prefactors rji 
in (|12[)). In some sense, TAD can be seen as a clever saddle point search, with 
a rigorous way to stop the searching procedure. 

If the Arrhenius law (12) is exactly satisfied, one can check that the TAD 
algorithm simulates an exit event which is statistically exact. The mathematical 
question raised by this algorithm is thus to estimate the difference between the 


ratio of the rates 




and the estimate deduced from the Arrhenius law 


exp(—(/I^* — l3’‘°){V{xi) — D(xo))) (see the extrapolation formula © above). 
In H], we consider as a first step the case of a one dimensional potential, where 
S is a single well, and we prove that in the limit —)> oo with 

fixed. 


^hi 


■ /^hi 


Xlo 


’P 


lo 


= g-(/9“-/3'°)(y(xi)-y(xo)) 


1 1 


(15) 


The extension of this result to a high dimensional setting is a work under 
progress. 

Notice that, compared to the hyperdynamics, TAD is based on an additional 
assumption, namely the Arrhenius law. We expect that this implies larger error 
for TAD than for the hyper dynamics: for the hyperdynamics the error in (10)- 
0 is exponentially small in the limit /3 —)> oo, while for TAD, the error scales 
like 1//3 in (15). The interest of TAD compared to the hyperdynamics is that 
it does not require a biasing potential, which may be complicated to build in 
some situation. 


6 Conclusion and discussion 

We presented three algorithms which have been proposed by A.F. Voter and co¬ 
workers in order to efficiently generate the state-to-state dynamics associated 
with a metastable stochastic process. 

We proposed an analysis of these algorithms based on the notion of quasi- 
stationary-distribution (QSD). As explained above, starting from the QSD 
within a well, the exit event is Markovian since the exit time is exponentially 
distributed and independent of the next visiting state. From a theoretical view¬ 
point, the QSD thus seems to be an interesting intermediate to relate Markovian 
dynamics in continuous state space (such as the Langevin ([^ or the overdamped 
Langevin dynamics) with Markovian dynamics in discrete state space (ki¬ 
netic Monte Carlo models or Markov state models). It also gives a natural 
definition of a metastable region (see 0) as a region where the stochastic pro¬ 
cess reaches local equilibrium (namely the QSD) before exiting. 

Going from Parallel Replica to Hyperdynamics to TAD, the assumptions 
required for the algorithm to be correct are more and more stringent. Indeed, 
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for Parallel Replica, no assumptions is required beyond the fact that a good 
estimate of Tcorr is available (to assess the convergence of the QSD). Hyper¬ 
dynamics requires additional assumptions on the potential: the metastability 
of the state should come from energetic barriers (at least for our mathemat¬ 
ical analysis to apply). Finally, the TAD algorithm requires in addition the 
Arrhenius law to be satisfied. Likewise, going from Parallel Replica to Hyper¬ 
dynamics to TAD, the errors introduced by the algorithm are expected to be 
larger and larger. On the other hand, the expected computational gain with 
Parallel replica is typically smaller than for the two other methods. In addi¬ 
tion, comparing hyperdynamics with TAD, the interest of TAD is that it does 
not require the construction of a biasing potential, which is a difficult task in 
general. 

One practical aspect we did not discuss above is the choice of the partition of 
the configurational space into states (in other words, the choice of the function 
S). Let us focus on the Parallel Replica algorithm for simplicity. We already 
mentioned that thanks to the decorrelation step, the algorithm is consistent 
whatever the choice of the partition. However, the efficiency of the algorithm 
highly depends on the choice of the partition: the states should be metastable 
regions, so that the stochastic process reaches the local equilibrium (the QSD) 
before leaving the state. How to design a good partition is a difficult question. 
For a system with high energy barriers (this is often the case for application in 
materials science for example), the original idea of A.F. Voter and co-workers 
is to define the states as the basins of attraction of the local minima of W for 
the simple gradient dynamics x = — W(x). For a system with more diffusive 
or entropic barriers (this is typically the case for biological applications), one 
could think of defining the states using relevant reaction coordinates (see for 
example [5] where the states are defined in terms of the molecular topology of 
the molecule of interest). Notice that choosing a good partition also implies 
being able to estimate the correlation time within each state either from some 
a priori knowledge, or some on-the-fly estimates [3]. 

Let us finally mention the mathematical questions raised by these algorithms 
and that we would like to investigate in the future. 

Concerning the TAD algorithm, the analysis of the validity of the Arrhe¬ 
nius law starting from the QSD has only been done for the moment in a one¬ 
dimensional situation. The extension to a more general setting is a work under 
progress. 

We would like to stress that all these algorithms are used in practice with the 
Langevin dynamics Q . The mathematical results we presented above assumed 
that the dynamics was the overdamped Langevin Q. Therefore, some works 
have to be done to extend these results to the Langevin dynamics. There are 
mainly two difficulties. First, the infinitesimal generator associated with the 
overdamped Langevin is symmetric (in an appropriate weighted space) while 
this is not the case for Langevin, which implies that the study of the spectrum 
of the infinitesimal generator for Langevin is more complicated. In addition, 
the fact that the domain of interest is typically bounded in position but not in 
velocity implies additional difficulties. We refer to the recent work m by F. 
Nier, which gives in particular some conditions for the existence of an isolated 
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smallest eigenvalue (and therefore of the QSD) for the Fokker-Planck operator 
associated with the Langevin dynamics. Second, while there is an extensive 
literature on the spectrum of the infinitesimal generator of the overdamped 
Langevin dynamics in the small temperature regime (semi classical analysis 
for Schrodinger operators and Witten laplacians), this is not the case for the 
Langevin dynamics. This means that the analysis of the hyperdynamics or TAD 
will certainly be more involved for Langevin than for overdamped Langevin. 
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